 weighted loss function


Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss

arXiv.org Artificial Intelligence

This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Our approach addresses data imbalance by dynamically adjusting class weights, thereby enhancing performance on minority emotion classes without the computational burden of traditional resampling methods. We evaluate BERT, RoBERTa, and BART on the BRIGHTER dataset, using evaluation metrics such as Micro F1, Macro F1, ROC-AUC, Accuracy, and Jaccard similarity coefficients. The results demonstrate that the weighted loss function improves performance on high-frequency emotion classes but shows limited impact on minority classes. These findings underscore both the effectiveness and the challenges of applying this approach to imbalanced multi-label emotion detection.
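As a rough illustration of what a class-weighted loss for multi-label classification can look like, the sketch below computes a binary cross-entropy in which each emotion class has its own positive weight. The abstract does not state the exact weighting scheme, so the negatives-to-positives ratio used here, along with the function names `positive_weights` and `weighted_bce`, are assumptions made for illustration only.

```python
import math

def positive_weights(labels):
    """Per-class weight = (#negatives / #positives); an assumed heuristic."""
    n = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        pos = sum(row[c] for row in labels)
        # Fall back to a neutral weight when a class has no positives.
        weights.append((n - pos) / pos if pos > 0 else 1.0)
    return weights

def weighted_bce(probs, labels, pos_weight, eps=1e-12):
    """Mean binary cross-entropy; positive terms are scaled per class."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y, w in zip(p_row, y_row, pos_weight):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# Toy data: class 0 is frequent, class 1 is rare.
labels = [[1, 0], [1, 0], [1, 1], [0, 0]]
weights = positive_weights(labels)
probs = [[0.5, 0.5] for _ in labels]
loss = weighted_bce(probs, labels, weights)
```

With this heuristic the rare class receives the larger weight, so a missed positive on that class contributes more to the loss than one on the frequent class.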


Loss Functions for Predictor-based Neural Architecture Search

arXiv.org Artificial Intelligence

Evaluation is a critical but costly procedure in neural architecture search (NAS). Performance predictors have been widely adopted to reduce evaluation costs by directly estimating architecture performance. The effectiveness of predictors is heavily influenced by the choice of loss functions. While traditional predictors employ regression loss functions to evaluate the absolute accuracy of architectures, recent approaches have explored various ranking-based loss functions, such as pairwise and listwise ranking losses, to focus on the ranking of architecture performance. Despite their success in NAS, the effectiveness and characteristics of these loss functions have not been thoroughly investigated. In this paper, we conduct the first comprehensive study on loss functions in performance predictors, categorizing them into three main types: regression, ranking, and weighted loss functions. Specifically, we assess eight loss functions using a range of NAS-relevant metrics on 13 tasks across five search spaces. Our results reveal that specific categories of loss functions can be effectively combined to enhance predictor-based NAS. Furthermore, our findings could provide practical guidance for selecting appropriate loss functions for various tasks. We hope this work provides meaningful insights to guide the development of loss functions for predictor-based methods in the NAS community.
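The difference between a regression loss and a pairwise ranking loss can be sketched in a few lines. The hinge form and the `margin` value below are assumptions chosen for illustration; the abstract does not specify which pairwise loss variants the study evaluates.

```python
def mse_loss(pred, target):
    """Regression loss: penalizes error in the absolute predicted accuracy."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pairwise_hinge_loss(pred, target, margin=0.1):
    """Ranking loss: penalizes pairs whose predicted order is wrong or too close."""
    total, pairs = 0.0, 0
    n = len(pred)
    for i in range(n):
        for j in range(n):
            if target[i] > target[j]:  # architecture i is truly better than j
                total += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

# True accuracies of three hypothetical architectures.
target = [0.9, 0.8, 0.7]
well_ranked = [0.5, 0.3, 0.1]   # wrong absolute values, correct order
mis_ranked = [0.1, 0.3, 0.5]    # reversed order
```

Here `well_ranked` incurs a nonzero regression loss but zero ranking loss, which is the behavior that makes ranking losses attractive when only the ordering of architectures matters.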


A Time-Series Data Augmentation Model through Diffusion and Transformer Integration

arXiv.org Artificial Intelligence

Yuren Zhang, Zhongnan Pu, Lei Jing, Member, IEEE. IEEE Transactions on Human-Machine Systems.

Abstract: With the development of Artificial Intelligence, numerous real-world tasks have been accomplished using technology integrated with deep learning. To achieve optimal performance, deep neural networks typically require large volumes of data for training. Although advances in data augmentation have facilitated the acquisition of vast datasets, most of this data is concentrated in domains like images and speech. However, there has been relatively less focus on augmenting time-series data. To address this gap and generate a substantial amount of time-series data, we propose a simple and effective method that combines the Diffusion and Transformer models. By utilizing an adjusted diffusion denoising model to generate a large volume of initial time-step action data, followed by employing a Transformer model to predict subsequent actions, and incorporating a weighted loss function to achieve convergence, the method demonstrates its effectiveness. Using the performance improvement of the model after applying augmented data as a benchmark, and comparing the results with those obtained without data augmentation or using traditional data augmentation methods, this approach shows its capability to produce high-quality augmented data.

Introduction: With the development of artificial intelligence (AI), numerous tasks in the real world have been accomplished through technologies combined with deep learning. Typically, a neural network that exhibits excellent performance requires a substantial amount of data for training. Various types of multi-modal data, such as images, speech, and audio, can now be easily obtained from the Internet. The acquisition of these types of data is no longer an issue. However, due to privacy concerns, costs, and other factors, not all types of data can reach the scale of image or other such data. For instance, the data scale of rare diseases often remains relatively small [1], [2].


Custom Loss Functions in Fuel Moisture Modeling

arXiv.org Machine Learning

Fuel moisture content (FMC) is a key predictor for wildfire rate of spread (ROS). Machine learning models of FMC have been used increasingly in recent years, augmenting or replacing traditional physics-based approaches. ROS has a highly nonlinear relationship with FMC, where small differences in dry fuels lead to large differences in ROS. In this study, custom loss functions that place more weight on dry fuels were examined with a variety of machine learning models of FMC. The models were evaluated with a spatiotemporal cross-validation procedure to examine whether the custom loss functions led to more accurate forecasts of ROS. Results show that the custom loss functions improved accuracy for ROS forecasts by a small amount. Further research would be needed to establish whether the improvement in ROS forecasts leads to more accurate real-time wildfire simulations.
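A loss that places more weight on dry fuels can be sketched as a weighted mean squared error whose weights shrink as the observed FMC grows. The exponential weight and the `decay` parameter below are assumptions for illustration; the abstract does not say which weighting functions the study used.

```python
import math

def dry_fuel_weights(fmc_true, decay=0.1):
    """Assumed weighting: lower observed FMC (drier fuel) gets a larger weight."""
    return [math.exp(-decay * y) for y in fmc_true]

def weighted_mse(pred, true, weights):
    """Weighted mean squared error, normalized by the sum of the weights."""
    num = sum(w * (p - t) ** 2 for p, t, w in zip(pred, true, weights))
    return num / sum(weights)

# Two observations: one dry (5% FMC) and one wet (30% FMC).
fmc_true = [5.0, 30.0]
weights = dry_fuel_weights(fmc_true)
error_on_dry = weighted_mse([7.0, 30.0], fmc_true, weights)
error_on_wet = weighted_mse([5.0, 32.0], fmc_true, weights)
```

Both predictions are off by the same two percentage points, yet the error on the dry observation is penalized more, which mirrors the motivation that small FMC errors in dry fuels produce large ROS errors.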


A Data Fusion Framework for Multi-Domain Morality Learning

arXiv.org Artificial Intelligence

Language models can be trained to recognize the moral sentiment of text, creating new opportunities to study the role of morality in human life. As interest in language and morality has grown, several ground truth datasets with moral annotations have been released. However, these datasets vary in the method of data collection, domain, topics, instructions for annotators, etc. Simply aggregating such heterogeneous datasets during training can yield models that fail to generalize well. We describe a data fusion framework for training on multiple heterogeneous datasets that improves performance and generalizability. The model uses domain adversarial training to align the datasets in feature space and a weighted loss function to deal with label shift. We show that the proposed framework achieves state-of-the-art performance on different datasets compared to prior work in morality inference.


Confidence May Cheat: Self-Training on Graph Neural Networks under Distribution Shift

arXiv.org Artificial Intelligence

Graph Convolutional Networks (GCNs) have recently attracted vast interest and achieved state-of-the-art performance on graphs, but their success typically hinges on careful training with large amounts of expensive and time-consuming labeled data. To alleviate labeled data scarcity, self-training methods have been widely adopted on graphs by labeling high-confidence unlabeled nodes and then adding them to the training step. Along this line, we conduct a thorough empirical study of current self-training methods on graphs. Surprisingly, we find that high-confidence unlabeled nodes are not always useful, and can even introduce a distribution shift between the original labeled dataset and the dataset augmented by self-training, severely hindering the capability of self-training on graphs. To this end, we propose a novel Distribution Recovered Graph Self-Training framework (DR-GST), which recovers the distribution of the original labeled dataset. Specifically, we first prove that the loss function of the self-training framework under distribution shift equals that under the population distribution if each pseudo-labeled node is weighted by a proper coefficient. Considering the intractability of the coefficient, we then propose to replace the coefficient with the information gain after observing the same changing trend between them, where information gain is estimated via both dropout variational inference and dropedge variational inference in DR-GST. However, such a weighted loss function will enlarge the impact of incorrect pseudo labels. As a result, we apply the loss correction method to improve the quality of pseudo labels. Both our theoretical analysis and extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed DR-GST, as well as each well-designed component in DR-GST.


Not all Failure Modes are Created Equal: Training Deep Neural Networks for Explicable (Mis)Classification

arXiv.org Machine Learning

Deep Neural Networks are often brittle on image classification tasks and known to misclassify inputs. While these misclassifications may be inevitable, not all failure modes can be considered equal. Certain misclassifications (e.g., classifying an image of a dog as an airplane) can create surprise and result in the loss of human trust in the system. Even worse, certain errors (e.g., a person misclassified as a primate) can have societal impacts. Thus, in this work, we aim to reduce inexplicable errors. To address this challenge, we first discuss how to obtain the class-level semantics that capture the human's expectation ($M^h$) regarding which classes are semantically close vs. ones that are far away. We show that for datasets like CIFAR-10 and CIFAR-100, class-level semantics can be obtained by leveraging human subject studies (significantly less expensive than in existing work) and, whenever possible, by utilizing publicly available human-curated knowledge. Second, we propose the use of Weighted Loss Functions to penalize misclassifications by the weight of their inexplicability. Finally, we show that training (or even fine-tuning) existing classifiers with the two proposed methods leads to Deep Neural Networks that have (1) comparable top-1 accuracy, an important metric in operational contexts, (2) more explicable failure modes, and (3) significantly lower cost in terms of additional human labels compared to existing work.


Improvement of Batch Normalization in Imbalanced Data

arXiv.org Artificial Intelligence

In this study, we consider classification problems based on neural networks in data-imbalanced environments. Learning from an imbalanced data set is one of the most important and practical problems in the field of machine learning. A weighted loss function based on a cost-sensitive approach is a well-known effective method for imbalanced data sets. We consider a combination of a weighted loss function and batch normalization (BN) in this study. BN is a powerful standard technique in recent developments in deep learning. A simple combination of both methods leads to a size-mismatch problem, because the two methods interpret the effective size of the data set differently. We propose a simple modification to BN to correct the size mismatch and demonstrate that this modified BN is effective in data-imbalanced environments.


Matrix denoising for weighted loss functions and heterogeneous signals

arXiv.org Machine Learning

We consider the problem of recovering a low-rank matrix from a noisy observed matrix. Previous work has shown that the optimal method for recovery depends crucially on the choice of loss function. We use a family of weighted loss functions, which arise naturally in many settings such as heteroscedastic noise and missing data. Weighted loss functions are challenging to analyze because they are not orthogonally-invariant. We derive optimal spectral denoisers for these weighted loss functions. By combining different weights, we then use these optimal denoisers to construct a new denoiser that exploits heterogeneity in the signal matrix for more accurate recovery with unweighted loss.